Segmentation Standard for Chinese Natural Language Processing
نویسندگان
چکیده
This paper proposes a segmentation standard for Chinese natural language processing. The standard is proposed to achieve linguistic felicity, computational feasibility, and data uniformity. Linguistic felicity is maintained by defining a segmentation unit to be equivalent to the theoretical definition of word, and by providing a set of segmentation principles that are equivalent to a functional definition of a word. Computational feasibility is ensured by the fact that the above functional definitions are procedural in nature and can be converted to segmentation algorithms, as well as by the implementable heuristic guidelines which deal with specific linguistic categories. Data uniformity is achieved by stratification of the standard itself and by defining a standard lexicon as part of the segmentation standard.
منابع مشابه
A Chinese word segmentation based on language situation in processing ambiguous words
While the processing of natural language is beneficial to the text mining, Chinese word segmentation is an important step in the processing of Chinese natural language. In this paper, the convergence essence of the segmentation process is analyzed, and a theory of Chinese word segmentation based on language situation is deducted. Based on the segmentation theory, an algorithm of Chinese word se...
متن کاملText Window Denoising Autoencoder: Building Deep Architecture for Chinese Word Segmentation
Deep learning is the new frontier of machine learning research, which has led to many recent breakthroughs in English natural language processing. However, there are inherent differences between Chinese and English, and little work has been done to apply deep learning techniques to Chinese natural language processing. In this paper, we propose a deep neural network model: text window denoising ...
متن کاملA New Psychometric-inspired Evaluation Metric for Chinese Word Segmentation
Word segmentation is a fundamental task for Chinese language processing. However, with the successive improvements, the standard metric is becoming hard to distinguish state-of-the-art word segmentation systems. In this paper, we propose a new psychometric-inspired evaluation metric for Chinese word segmentation, which addresses to balance the very skewed word distribution at different levels o...
متن کاملFudanNLP: A Toolkit for Chinese Natural Language Processing
The growing need for Chinese natural language processing (NLP) is largely in a range of research and commercial applications. However, most of the currently Chinese NLP tools or components still have a wide range of issues need to be further improved and developed. FudanNLP is an open source toolkit for Chinese natural language processing (NLP), which uses statistics-based and rule-based method...
متن کاملNetEase Automatic Chinese Word Segmentation
This document analyses the bakeoff results from NetEase Co. in the SIGHAN5 Word Segmentation Task and Named Entity Recognition Task. The NetEase WS system is designed to facilitate research in natural language processing and information retrieval. It supports Chinese and English word segmentation, Chinese named entity recognition, Chinese part of speech tagging and phrase conglutination. Evalua...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- IJCLCLP
دوره 2 شماره
صفحات -
تاریخ انتشار 1996